Adding multi file prediction with Array Job #1

Open
Thomasbush9 wants to merge 1 commit into KempnerInstitute:main from Thomasbush9:feature_branch
Conversation

@Thomasbush9

No description provided.

@dmbala requested a review from Copilot September 12, 2025 13:14

Copilot AI left a comment


Pull Request Overview

This PR adds support for multi-file protein folding predictions using SLURM array jobs. The enhancement allows batch processing of multiple FASTA files by automatically splitting them into chunks and submitting parallel prediction jobs.

  • Introduces a bash script to split input files into chunks and submit array jobs
  • Adds a SLURM array job script for processing individual file chunks
  • Updates documentation to explain the new multi-file prediction workflow
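
The workflow the bullets describe can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: the function name `split_and_submit`, the `chunk_` file prefix, and the round-robin `split -n r/N` (GNU coreutils) are all assumptions.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Sketch of the split-and-submit flow: collect FASTA paths, split the
# list round-robin into N chunk files, then submit one array job whose
# tasks each consume one chunk list. All names here are illustrative.
split_and_submit() {
  local input_dir=$1 n=$2 out_dir=$3
  local chunk_dir
  chunk_dir="$out_dir/chunks_$(date +%Y%m%d_%H%M%S)"
  mkdir -p "$chunk_dir"

  # One FASTA path per line, then split into chunk_00, chunk_01, ...
  find "$input_dir" -name '*.fasta' | sort > "$chunk_dir/all_files.txt"
  split -d -n r/"$n" "$chunk_dir/all_files.txt" "$chunk_dir/chunk_"

  # Echoed rather than executed so the sketch runs off-cluster:
  echo "sbatch --array=0-$((n - 1)) single_prediction_array.slrm $chunk_dir"
}
```

On the cluster one would drop the `echo`; each array task would then look up the chunk list matching its `SLURM_ARRAY_TASK_ID`.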

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| split_and_pred.sh | Main orchestration script that splits input files into chunks and submits SLURM array jobs |
| single_prediction_array.slrm | SLURM array job script that processes individual file chunks through the protein folding pipeline |
| README.md | Updated documentation explaining the new multi-file prediction workflow and usage instructions |


Comment on lines +4 to +5
```bash
# Usage: ./split_and_submit.sh INPUT_DIR N OUTPUT_PARENT_DIR
# Example: ./split_and_submit.sh /data/images 5 /data/jobs
```

Copilot AI Sep 12, 2025


The usage comment references 'split_and_submit.sh' but the actual filename is 'split_and_pred.sh'. Additionally, the example uses 'images' directory which is misleading for a protein folding workflow that expects FASTA files.

Suggested change

```diff
-# Usage: ./split_and_submit.sh INPUT_DIR N OUTPUT_PARENT_DIR
-# Example: ./split_and_submit.sh /data/images 5 /data/jobs
+# Usage: ./split_and_pred.sh INPUT_DIR N OUTPUT_PARENT_DIR
+# Example: ./split_and_pred.sh /data/fasta_files 5 /data/jobs
```

Comment on lines +11 to +17
```bash
#SBATCH --mail-user=thomasbush52@gmail.com
# Use array-aware log names to avoid clobbering:
#SBATCH --output=/n/home06/tbush/job_logs/%x.%A_%a.out


set -euo pipefail
```

Copilot AI Sep 12, 2025


Hardcoded email address should not be committed to the repository. This should be removed or made configurable to prevent other users from receiving unintended notifications.

Suggested change

```diff
-#SBATCH --mail-user=thomasbush52@gmail.com
+#SBATCH --mail-user=${MAIL_USER}
 # Use array-aware log names to avoid clobbering:
 #SBATCH --output=/n/home06/tbush/job_logs/%x.%A_%a.out
+# Set MAIL_USER environment variable to your email before submitting, e.g.:
+# export MAIL_USER=your.email@domain.com

 set -euo pipefail
+# Ensure MAIL_USER is set
+: "${MAIL_USER:?MAIL_USER environment variable not set. Please set it to your email address before submitting.}"
```
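
One caveat worth noting when applying this suggestion: `sbatch` reads `#SBATCH` lines literally and does not expand shell variables in them, so `--mail-user=${MAIL_USER}` inside the script would not take effect as written. A hedged alternative is to pass the address on the command line at submission time, since command-line options override in-script directives. The placeholder address and the `--mail-type=END,FAIL` choice below are illustrative, and the `echo` keeps the sketch runnable off-cluster:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholder default so the sketch runs anywhere; export your own address.
MAIL_USER="${MAIL_USER:-you@example.com}"

# Fail fast if MAIL_USER somehow ended up empty.
: "${MAIL_USER:?set MAIL_USER to your email address before submitting}"

# Command-line options take precedence over #SBATCH directives, so no
# email needs to be hardcoded in the job script. Drop the echo on the cluster.
echo sbatch --mail-user="$MAIL_USER" --mail-type=END,FAIL single_prediction_array.slrm
```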

Comment on lines +71 to +72
````markdown
```{bash}
source slrm_scripts/multi_pred.sh INPUT_DIR N OUT_DIR
````

Copilot AI Sep 12, 2025


The documentation references 'slrm_scripts/multi_pred.sh' but the actual script is named 'split_and_pred.sh' and located in the current directory, not in a 'slrm_scripts' subdirectory.

Suggested change

````diff
-```{bash}
-source slrm_scripts/multi_pred.sh INPUT_DIR N OUT_DIR
+```bash
+source ./split_and_pred.sh INPUT_DIR N OUT_DIR
````

- Divide the input dir files into n sets, generate .txt containing the path to each .fasta (one per set)
- create an out_dir/chunks_timestamp/ directory where the predictions will be stored

- start N jobs launching the script: slrm_scripts/single_prediction.slrm n times (you can modify the resource of each job by modifying this script)

Copilot AI Sep 12, 2025


The documentation references 'slrm_scripts/single_prediction.slrm' but the actual script is named 'single_prediction_array.slrm' and located in the current directory.

Suggested change

```diff
-- start N jobs launching the script: slrm_scripts/single_prediction.slrm n times (you can modify the resource of each job by modifying this script)
+- start N jobs launching the script: ./single_prediction_array.slrm n times (you can modify the resource of each job by modifying this script)
```
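
For context on what each of those N array tasks would do, the consumer side of the workflow might look roughly like this. The function name `run_chunk`, the `chunk_%02d` naming, and the `echo` standing in for the real prediction command are all illustrative assumptions, not the PR's actual script:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Each SLURM array task selects the chunk list matching its index
# (SLURM_ARRAY_TASK_ID) and processes every FASTA path it contains.
run_chunk() {
  local chunk_dir=$1
  local task_id=${SLURM_ARRAY_TASK_ID:-0}
  local chunk_file
  chunk_file=$(printf '%s/chunk_%02d' "$chunk_dir" "$task_id")

  while IFS= read -r fasta; do
    echo "predicting $fasta"   # stand-in for the real prediction call
  done < "$chunk_file"
}
```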
